This article revisits the theme of an article I wrote for Simple-Talk two years ago called Reading and Writing Files in SQL Server using T-SQL , which to my amazement, has proved to be the most popular article I’ve ever written.
In this belated second part (I’ll assume you have SQL Server 2005 or 2008 most of the time) We’re going to show you the TSQL you can use to
- Get a List of Files in a directory/folder, with, if you with, the subfolders and attributes such as file size, revision date, or short-form.
- Access the SHELL.APPLICATION to do file operations
- Read data from file into a TSQL variable
- Read data into a table, each line in a table row
- Write data from a TSQL variable into a file
- Write the String-based results of a SQL Expression into a file
These are simple operations, but it is some of the simplest operations in a database that can prove to be the most taxing: By way of contrast when you are working in DOS, the DIR or ls commands are one of the first commands that one learns. The operation of getting data to or from files is so easy; but not always so in SQL Server. This is because SQL Server is usually the worst place to do such an operation. Very occasionally, it isn’t. In the hard reality of life in an IT department, it is sometimes the only practical option. For the majority of operations, it is the obvious choice to use SSIS/DTS, but not invariably.
The first problem you face is security. The Database User that runs these routines that I show will need ‘special powers’, powers that can’t possibly be given to general users for reasons of security .It also requires you to open up hatches in your database security ‘Surface Area Configuration’, that are normally battened down, such as OLE Automation, and use of the command line. (you’ll probably need to use the utility called ‘Surface Area Configuration for Features’) We’re not going to cover these issues here, or show you ways to solve the problem of keeping your database secure. Instead, refer to The SQL Server Security Workbench and The Process Delegation workbench/
Getting a List of Files in a directory/folder.
Why should you ever want to get a list of files in SQL Server, perhaps with their properties, such as creation date or size as well? Most commonly, it is when one wants to archive off old backup files as part of a maintenance program, or delete old log-shipping files. I often find myself needing a list of files from a directory and all its subdirectories when I’m importing HTML or data files into a database. There are quite a few occasions where one wants an automatic data feed based on files appearing in a directory. I’m routinely having to pick up text-based log files of various description, and import them into tables. I’ve even used routines like these to index up an intranet store of documents for fast access. All this requires you to be able to check what’s on disk. read stuff in, and sometimes write stuff out., preferably at lightening speed. It is not always as simple as one might wish.
The most obvious way of getting a list of files into TSQL as a table is to use the following technique. I give it to you in a stored procedure that you will find is of little use except where it is part of a longer process. I’ll explain why later.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
-------------------------------------------------------------------- IF OBJECT_ID (N'dbo.ListPaths') IS NOT NULL DROP PROCEDURE dbo.ListPaths GO CREATE PROCEDURE dbo.ListPaths @FileSpec VARCHAR(2000), @order VARCHAR (80) = '/O-D'--sort by date time oldest first /* produce a table with a single column consisting of the path of every file, including subdirectories, of the directory specified You can specify the order in the order parameter N By name (alphabetic) S By size (smallest first) E By extension (alphabetic) D By date/time (oldest first) G Group directories first - Prefix to reverse order */ AS DECLARE @myfiles TABLE (MyID INT IDENTITY(1,1) PRIMARY KEY, FullPath VARCHAR(2000)) DECLARE @CommandLine VARCHAR(4000) IF @order IS NOT NULL -- abort if the order is silly BEGIN SELECT @CommandLine =LEFT('dir "' + @FileSpec + '" /A-D /B /S '+@order,4000) /* /A Displays files with specified attributes. attributes D Directories R Read-only files H Hidden files A Files ready for archiving S System files - Prefix meaning not /B Uses bare format (no heading information or summary). /S Displays files in specified directory and all subdirectories. */ INSERT INTO @MyFiles (FullPath) EXECUTE xp_cmdshell @CommandLine DELETE FROM @MyFiles WHERE fullpath IS NULL OR fullpath='File Not Found' END SELECT fullpath FROM @MyFiles ------------------------------------------------------------- GO EXECUTE listpaths 'C:\Workbench\*.txt' |
…and you get an error saying ‘An INSERT EXEC statement cannot be nested
‘
There are other problems to tackle. You only get the full path when you use the /S switch (to list subdirectories) and the DIR command goes doolally if you use the /S switch and if it can’t find the directory spec. Also, if you want to use this to archive all backups made over three weeks ago then you’re stuffed. the bare command doesn’t pass back the file sizes, or dates, and the information can’t be got reliably from the DIR command anyway for a variety of reasons (varying Date Settings, DOS version differences in the column settings, not all useful info there etc). Another problem is that you usually want the directory too without having to parse the filename. If you omit the /B /S, you get the filename, with the /B you get the whole path: Hassle.
There is one problem we can solve pretty smartly. It is due to the fact that it is a stored procedure rather than a function. insert into ...Exec
cannot be nested. You cannot use the obvious solution of doing an xp_cmdshell
in a function because insert into…execute isn’t allowed in a function. You can do an OPENROWSET
to do the EXECUTE
, but the OPENROWSET
only takes string literals, not variables, so you cannot specify the directory to get a directory listing from! This requires patience and cunning. Our first technique will be to use an XML output variable.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
-------------------------------------------------------------------- IF OBJECT_ID (N'dbo.ListPathsXML') IS NOT NULL DROP PROCEDURE dbo.ListPathsXML GO CREATE PROCEDURE dbo.ListPathsXML @FileSpec VARCHAR(2000), @order VARCHAR (80) = '/O-D',--sort by date time oldest first @xmlFileList XML OUTPUT AS DECLARE @myfiles TABLE (MyID INT IDENTITY(1,1) PRIMARY KEY, FullPath VARCHAR(2000)) DECLARE @CommandLine VARCHAR(4000) IF @order IS NOT NULL -- abort if the order is silly BEGIN SELECT @CommandLine =LEFT('dir "' + @FileSpec + '" /A-D /B /S '+@order,4000) INSERT INTO @MyFiles (FullPath) EXECUTE xp_cmdshell @CommandLine DELETE FROM @MyFiles WHERE fullpath IS NULL OR fullpath = 'File Not Found' END SET @xmlFileList = (SELECT fullpath FROM @MyFiles FOR XML PATH('thefile'), ROOT('thefiles'), TYPE) GO |
We can use the routine like this…
1 2 3 4 5 6 |
DECLARE @FileList XML EXECUTE ListPathsXML 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG', DEFAULT , @XMLFileList = @FileList OUTPUT SELECT x.thefile.value('fullpath[1]', 'varchar(2000)') AS [path] FROM @FileList.nodes('//thefiles/thefile') AS x ( thefile ) ------------------------------------------------------------------------ |
so, fired with enthusiasm, we construct a solution that allows us to do both subdirectory and simple directory listings, and tells us the directory that every file was found in. Just to show off a bit, we can get it just to list the directories.
The code for the procedure spGetFilePaths
is here. You need to click here to see the code! And then, maybe you could use it something like this, to get a listing of files
1 2 3 4 5 6 7 8 9 10 |
DECLARE @FileList XML, @FilePath VARCHAR(2000) SELECT @FilePath='C:\Program Files\Microsoft SQL Server' EXECUTE spGetFilePaths @FilePath, '*.*', @subdirectories = >1, @XMLFileList = @FileList OUTPUT SELECT directoryList.* FROM (SELECT @filepath AS BaseDirectory, x.thefile.value('thePath[1]', 'varchar(200)') AS Fullpath, x.thefile.value('filename[1]', 'varchar(200)') AS [filename], x.thefile.value('path[1]', 'varchar(200)') AS [Subdirectory] FROM @FileList.nodes('//thefiles/thefile') AS x ( thefile ))directoryList |
Let’s try a more sophisticated approach to try to encapsulate it in a function. We can use OLE automation in functions but there is a big snag. The File System Object (FSO) doesn’t allow file listings from xp_oaMethod
since the item property of a folder doesn’t have an index value. Strangely, Microsoft also have a slightly flawed method that uses SHELL.APPLICATION.
This last one works, more or less (It hates it if the directory you specify doesn’t exist, so we have to program around that)
Accessing the SHELL.APPLICATION to get a directory listing.
Here is a function that uses SHELL.APPLICATION
automation. You’ll have to click here to see the source code!
As you can see, it makes the whole process a lot simpler, but it is very slow.
1 2 3 4 5 |
--list all subdirectories beginning with M from "c:\program files" */ SELECT [path] FROM dbo.dir('c:\program files') WHERE name LIKE 'm%' AND IsFolder =1 --list every file and subdirectory in the WINNT directory SELECT * FROM dbo.dir('c:\winnt') |
This routine returns a table representing all the items in a folder. It takes as parameter the path to the folder. It does not take wildcards in the same way as a DIR command. Instead, you would be expected to filter the Table results of the function using SQL commands.
Notice that the size of the item (e.g. file) is not returned by this function. It is just not in the list of properties. Microsoft can be cussed at times.
This function uses the Windows Shell COM object via OLE automation. It opens a folder and iterates though the items listing their relevant properties. You can use the SHELL object to do all manner of things such as printing, copying, and moving filesystem objects, accessing the registry and so on. Powerful medicine.
This seems great. But the problem is that it doesn’t iterate over subdirectories. Oops. Also, it is a bit short on properties. Still it does for most purposes. It isn’t a complete substitute for using DIR because it is s…l…o..w. Most of the time, this doesn’t matter much, but just occasionally it is a pain.
So we now do a function that addresses most of these problems except for the slowness.
You will have to click here to see the source code for the dbo.files
function. This is an example of how it might be used.
1 2 |
SELECT SUM(size), COUNT(*) FROM dbo.files ('C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG', 1,1) |
Getting Text into a TSQL Variable
Now we have solved the awful problem of getting the names of the files we want, we can have some fun. It is now easy to read a file into SQL Server. There are a number of ways of getting stuff in to SQL Server but this is the simplest.
1 2 3 |
DECLARE @LotsOfText VARCHAR(MAX) SELECT @LotsOfText = BulkColumn FROM OPENROWSET(BULK 'C:\workbench\MyBigfile.txt', SINGLE_BLOB) AS x |
Yes, but some silly old moo in Microsoft has cobbled the OPENROWSET
function so it won’t take a variable so unless you hard-wire your OPENROWSET
commands to literals rather than variables (of course you don’t) you can’t use it.
However, there is a cunning trick you can use with sp_ExecuteSQL
to get round this. It is easier to demonstrate than explain.
So let’s try it out by writing a routine to find a particular string, ‘Explosive’ in this example, in a whole lot of files. We’ll use our ListPathsXML procedure from our file-listing armoury just because we only need the paths, but no other file information and we want it fast.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
DECLARE @LotsOfText NVARCHAR(MAX), @ii INT, @iiMax INT, @File VARCHAR(2000), @Command NVARCHAR(4000) DECLARE @files TABLE (MyID INT IDENTITY(1,1) PRIMARY KEY, [Path] VARCHAR(2000)) DECLARE @FileList XML EXECUTE ListPathsXML 'C:\MyDocuments\MyData.TXT', DEFAULT , @XMLFileList = @FileList OUTPUT INSERT INTO @files(path) SELECT x.thefile.value('fullpath[1]', 'varchar(2000)') AS [path] FROM @FileList.nodes('//thefiles/thefile') AS x ( thefile ) --don't look at the current errorlog! SELECT @ii=1, @iiMax=MAX(MyID) FROM @Files WHILE @ii<=@iiMax BEGIN SELECT @File= [path] FROM @files WHERE MyID=@ii SELECT @command='SELECT @LotsOfText = BulkColumn FROM OPENROWSET(BULK '''+REPLACE(@FILE,'''','''''')+''', SINGLE_BLOB) AS x' EXECUTE sp_ExecuteSQL @command, N'@lotsOfText nvarchar(max) output ',@lotsoftext output IF PATINDEX('%Explosive%',@LotsOfText)>0 SELECT 'Error report found in file "'+@file+'"' SELECT @ii=@ii+1 END SELECT CAST(@ii AS VARCHAR(5))+' files checked' |
Well, that is fine, but sometimes you want something that is going to bring the text from a file line-by-line so you can process it in a table, line-by-line.
Getting Text into a Table
We’ll illustrate this with a version of the routine that reads in all the errorlog
files for an instance line-by-line as a table, and prints out each line that has the word ‘error’ in it. This uses a new procedure called spLoadTextFromAFile
which does just that. It loads the text from a file line buy line and returns it as a result. You’ll have to see the code by clicking here. And here is the procedure in use.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
DECLARE @LotsOfText NVARCHAR(MAX), @ii INT, @iiMax INT, @File VARCHAR(2000) DECLARE @files TABLE (MyID INT IDENTITY(1,1) PRIMARY KEY, [Path] VARCHAR(2000)) DECLARE @lines TABLE (MyID INT IDENTITY(1,1) PRIMARY KEY, [line] NVARCHAR(MAX)) DECLARE @FileList XML EXECUTE ListPathsXML 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\LOG\ERRORLOG.*', DEFAULT , @XMLFileList = @FileList OUTPUT INSERT INTO @files(path) SELECT x.thefile.value('fullpath[1]', 'varchar(2000)') AS [path] FROM @FileList.nodes('//thefiles/thefile') AS x ( thefile ) DELETE FROM @files WHERE REVERSE(path) LIKE 'golrorre%' --don't look at the current errorlog! SELECT @ii=1, @iiMax=MAX(MyID) FROM @Files WHILE @ii<=@iiMax BEGIN SELECT @File= [path] FROM @files WHERE MyID=@ii INSERT INTO @lines(line) EXECUTE spLoadTextFromAFile @file, @Unicode=1 SELECT @ii=@ii+1 END SELECT MyID AS [line Number], Line, @file FROM @lines WHERE line LIKE '%Error%' |
Write data from a TSQL variable into a file
If you want to write text out from a variable to a file, here is something that will do the trick.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 |
----------------------------------------------------------------- IF OBJECT_ID (N'dbo.spSaveTextToFile') IS NOT NULL DROP PROCEDURE dbo.spSaveTextToFile GO CREATE PROCEDURE spSaveTextToFile @TheString VARCHAR(MAX), @Filename VARCHAR(255), @Unicode INT=0 AS SET NOCOUNT ON DECLARE @MySpecialTempTable VARCHAR(255) DECLARE @Command NVARCHAR(4000) DECLARE @RESULT INT --firstly we create a global temp table with a unique name SELECT @MySpecialTempTable = '##temp' + CONVERT(VARCHAR(12), CONVERT(INT, RAND() * 1000000)) --then we create it using dynamic SQL, & insert a single row --in it with the MAX Varchar stocked with the string we want SELECT @Command = 'create table [' + @MySpecialTempTable + '] (MyID int identity(1,1), Bulkcol varchar(MAX)) insert into [' + @MySpecialTempTable + '](BulkCol) select @TheString' EXECUTE sp_ExecuteSQL @command, N'@TheString varchar(MAX)', @TheString --then we execute the BCP to save the file SELECT @Command = 'bcp "select BulkCol from [' + @MySpecialTempTable + ']' + '" queryout ' + @Filename + ' ' + CASE WHEN @Unicode=0 THEN '-c' ELSE '-w' END + ' -T -S' + @@servername EXECUTE @RESULT= MASTER..xp_cmdshell @command, NO_OUTPUT EXECUTE ( 'Drop table ' + @MySpecialTempTable ) RETURN @result GO |
We’ll first read text in and then write it back out again to a different file just to get confidence that nothing gets corrupted. We’ll use a handy large text file for this. In effect, we are copying the file.
1 2 3 4 |
DECLARE @LotsOfText VARCHAR(MAX) SELECT @LotsOfText = BulkColumn FROM OPENROWSET(BULK 'C:\workbench\Moby-Dick.txt', SINGLE_BLOB) AS x EXECUTE spSaveTextToFile @lotsOfText, 'C:\workbench\Moby-Dick-copy.txt',0,0 |
Write the String-based results of a SQL Expression into a file
We can do other things with the same idea we’ve just seen. The following procedure even allows you to write the results of a SQL Statement to a file as long as the result is a single string column. This sort of idea can be adapted to a number of purposes.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
IF OBJECT_ID (N'dbo.spSaveTextResultToFile') IS NOT NULL DROP PROCEDURE dbo.spSaveTextResultToFile GO PROCEDURE spSaveTextResultToFile @TheSQL VARCHAR(MAX), @Filename VARCHAR(255), @Unicode INT=0 /* e.g. spSaveTextResultToFile 'Select logstring+'', ''+convert(char(11),insertionDate,113) from activitylog', 'C:\workbench\Logreport.txt' */ AS SET NOCOUNT ON DECLARE @MySpecialTempTable VARCHAR(255) DECLARE @Command NVARCHAR(4000) DECLARE @RESULT INT IF CHARINDEX ('Select ',LTRIM(@TheSQL))=0 BEGIN RAISERROR ('Usage spSaveTextResultToFile <The SQL Expression> <The Filename)',16,1) RETURN 1 END --firstly we create a global temp table with a unique name SELECT @MySpecialTempTable = '##temp' + CONVERT(VARCHAR(12), CONVERT(INT, RAND() * 1000000)) --then we create it using dynamic SQL, -- SELECT @Command = 'create table [' + @MySpecialTempTable + '] (MyID int identity(1,1), MyLine varchar(MAX)) insert into [' + @MySpecialTempTable + '](MyLine) ' +@TheSQL EXECUTE sp_ExecuteSQL @command --then we execute the BCP to save the file SELECT @Command = 'bcp "select Myline from [' + @MySpecialTempTable + ']' + '" queryout ' + @Filename + CASE WHEN @Unicode=0 THEN '-c' ELSE '-w' END + ' -T -S' + @@servername EXECUTE @RESULT= MASTER..xp_cmdshell @command, NO_OUTPUT EXECUTE ( 'Drop table ' + @MySpecialTempTable ) |
So here we are. We have a range of functions and stored procedures to do the basic file I/O, which you can combine with the routines I provided in the first article. None of this is pretty or elegant, but it is all part of the armoury of the DBA and database developer for those occasional routines where, for one reason or another, there isn’t a better alternative solution.
Load comments